HOME | Setting up the IVs | Downloading occurence data | Fitting SDMs

Documentation: Download occurence data from GBIF

Get species list

This function produces a list of species that we will later use to harvest occurence data from gbif. More on this later.

source("./R/spList.R")
mySpecies <- sl()
head(mySpecies)
## [1] "Botrychium lanceolatum"  "Comastoma tenellum"     
## [3] "Gentianella campestris"  "Kobresia simpliciuscula"
## [5] "Primula scandinavica"    "Pseudorchis albida"

Occurence data

To get occurence data I will use the gbif function in the dismo package. It can only handle one species at the time, so I will need to make a for-loop.

head(mySpecies)
## [1] "Botrychium lanceolatum"  "Comastoma tenellum"     
## [3] "Gentianella campestris"  "Kobresia simpliciuscula"
## [5] "Primula scandinavica"    "Pseudorchis albida"

This is my species list with correct spelling (direct from ADB). To test the functions I will use a shorter list of 10 species.

mySpecies2 <- mySpecies[1:10]

Test run

Let’s do a test loop without downloading anything, just seeing how many records there are.

nOccurences_df <- data.frame(species = mySpecies2, 
                  nOccurences = as.numeric(NA))

for(i in 1:length(mySpecies2)){
  myName  <- mySpecies2[i]
  myName2 <- stringr::str_split(myName, " ")[[1]]
  nOccurences_df$nOccurences[i] <- dismo::gbif(myName2[1], myName2[2], download = F) 
}
## Loading required namespace: jsonlite
nOccurences_df
##                    species nOccurences
## 1   Botrychium lanceolatum        5053
## 2       Comastoma tenellum        6321
## 3   Gentianella campestris       53391
## 4  Kobresia simpliciuscula        3392
## 5     Primula scandinavica         321
## 6       Pseudorchis albida       20958
## 7      Pulsatilla vernalis       23679
## 8    Buglossoides arvensis       42215
## 9       Anisantha sterilis      113335
## 10       Sorbus lancifolia         101

This shows some of the bias in the occurence data. For example that B arvensis, a super rare plant found almost only on Hovedøya, an island outside of Oslo, has 37k records, whereas P. scandinavica, a relatively common plant, has 321. However, I’m pretty sure occurences within the same 1x1 km cell will only count as one (duplicates removed by the sdm function.)

For the next part I will use two species with a quite low number of records to reduce processing time and test potentias problems due to low sample sizes.

mySpecies3 <- mySpecies[mySpecies == c("Primula scandinavica", "Kobresia simpliciuscula")]
# write.csv(mySpecies3, 'data/species_list.csv')

For fun. lets see what these plants look like.

list.files("./figures/plants")

Alt text Picture: Kobresia simpliciuscula (Andrey Zharkikh CC-BY 2.0)

Alt text Picture: Primula scandinavica (Anders Kolstad CC-BY 4.0)

(Note: The picture sizes are 250p and 400p, respectively)

Download

For real this time:

for(i in 1:length(mySpecies3)){
  myName  <- mySpecies3[i]
  myName2 <- stringr::str_split(myName, " ")[[1]]
  
  assign(
    sub(' ', '_', mySpecies3[i]), 
         dismo::gbif(myName2[1], myName2[2], 
                                        download = T,
                                        geo = T, 
                                        sp = F) 
  )
}
## 3392 records found
## 0-300-600-900-1200-1500-1800-2100-2400-2700-3000-3300-3392 records downloaded
## 321 records found
## 0-300-321 records downloaded

Two new dataframes are put in the environment. They have a lot of columns to start with, so lets get rid of som to make the objects smaller. I only need the species names and the coordinates (perhaps some more, but I can add those later).

qc <- data.frame(Species = mySpecies3,
                 lon_is_NA =                        as.numeric(NA),
                 lat_NA_when_lon_not          =     as.numeric(NA),
                 lon_is_zero =                      as.numeric(NA),
                 lat_zero_when_lon_not = as.numeric(NA))

for(i in 1:length(mySpecies3)){
  
  
  d <- get(
           sub(' ', '_', mySpecies3[i]))
  d <- d[,c("species","lat","lon")]
  
  # remove spaces in names (it clogs up the sdm function)
  d$species <- sub(' ', '_', d$species)
  
  # remove NA's
  w1 <- which(is.na(d$lon))
  if(length(w1) != 0) d <- d[-w1,]
  w2 <- which(is.na(d$lat))
  if(length(w2) != 0) d <- d[-w2,]
  
  # remove those with coordinates equal to zero
  w3 <- which(d$lon == 0)
  if(length(w3) != 0) d <- d[-w3,]
  w4 <- which(d$lat == 0)
  if(length(w4) != 0) d <- d[-w4,]
  
  assign(
      sub(' ', '_', mySpecies3[i]),  d)
  
  qc[i,2] <- length(w1)
  qc[i,3] <- length(w2)
  qc[i,4] <- length(w3)
  qc[i,5] <- length(w4)
  
}

A dataframe called qc tells us what has happened.

qc
##                   Species lon_is_NA lat_NA_when_lon_not lon_is_zero
## 1 Kobresia simpliciuscula       786                   0           0
## 2    Primula scandinavica       172                   0           0
##   lat_zero_when_lon_not
## 1                     0
## 2                     0

We have cut 787 rows from the Kobresia and 172 from Primula, due to missing coordinates.

Now we can turn the dataframes into spatialPointsDataFrames, define the CRS, and plot the points. The dataset comes as lonlat.

for(i in 1:length(mySpecies3)){
  
  d <- get(
           sub(' ', '_', mySpecies3[i]))
  
  sp::coordinates(d) <- ~lon + lat
  sp::proj4string(d) <- sp::proj4string(raster::raster())
  
  assign(
      sub(' ', '_', mySpecies3[i]),  d)
}
mapview::mapview(Kobresia_simpliciuscula, 
                 map.types = c("Esri.WorldShadedRelief",
                               "Esri.WorldImagery"),
                 cex = 5, lwd = 0,
                 alpha.regions = 0.5,
                 col.regions = "blue")
mapview::mapview(Primula_scandinavica, 
                 map.types = c("Esri.WorldShadedRelief",
                               "Esri.WorldImagery"),
                 cex = 5, lwd = 0,
                 alpha.regions = 0.5,
                 col.regions = "blue")

First, notice that Kobresia is called Carex in GBIF, but Kobresia in ADB. This is not a problem and they are recognised as synonyms. The Kobresia is a widespread species, whereas the Primula is endemic to Norway and Sweden. We only need the points that fall on Norway. First we need something to clip against, so we’ll get an outline of Norway.

# outline <- norway()
#saveRDS(outline, "data/large/outline_Norway.RData") # 1.8MB
outline <- readRDS("data/large/outline_Norway.RData")
raster::plot(outline)

Now to clip away occurences outside this polygon (can take a few minutes)

for(i in 1:length(mySpecies3)){
  
  d <- get(
           sub(' ', '_', mySpecies3[i]))
  
  d <- raster::crop(d, outline)
  
  assign(
      sub(' ', '_', mySpecies3[i]),  d)
}

Let’s see it it worked.

mapview::mapview(Kobresia_simpliciuscula, 
                 map.types = c("Esri.WorldShadedRelief",
                               "Esri.WorldImagery"),
                 cex = 5, lwd = 0,
                 alpha.regions = 0.5,
                 col.regions = "blue")
mapview::mapview(Primula_scandinavica, 
                 map.types = c("Esri.WorldShadedRelief",
                               "Esri.WorldImagery"),
                 cex = 5, lwd = 0,
                 alpha.regions = 0.5,
                 col.regions = "blue")

Looks like it. Now we just need to get this over to UTM32 to match the IV data, and save it on file.

for(i in 1:length(mySpecies3)){
  
  d <- get(
           sub(' ', '_', mySpecies3[i]))
  
  d <- sp::spTransform(d, myIVs[[1]]@crs)
  
  assign(
      sub(' ', '_', mySpecies3[i]),  d)
}

oDat <- get(sub(' ', '_', mySpecies3[1]))
for(i in 2:length(mySpecies3)){
  oDat <- rbind(oDat, get(sub(' ', '_', mySpecies3[i])))
  }
saveRDS(oDat, 'data/large/oDat.RData')
rm(oDat)
oDat <- readRDS('data/large/oDat.RData')
myIVs              <- raster:: stack('data/IV.grd')
raster::plot(myIVs$Forest_Productivity)
raster::plot(oDat,add=T)

Check sample sizes

Let’s see how mny point there are for each species, and how many of these that fall on the same 1x1 km grid cells.

t <- myIVs$elev
df <- data.frame(species = NA,
                points = as.numeric(NA),
                unique = as.numeric(NA))
for(i in 1:length(unique(oDat$species))){
  s       <- unique(oDat$species)[i]
  df[i,1] <- paste(s)
  t1      <- oDat[oDat$species == s,]
  df[i,2] <- length(t1)
  u       <- raster::rasterize(t1, t, 1, fun = "count")
  df[i,3] <- length(u[u>0])
}
df
##                species points unique
## 1 Carex_simpliciuscula   1188    811
## 2 Primula_scandinavica    120     57

Not very good for the Primula, and this species was also much harder to model.